Pose Constraints for Consistent Self-supervised Monocular Depth and Ego-Motion


Abstract

Self-supervised monocular depth estimation approaches suffer not only from scale ambiguity but also infer depth maps that are temporally inconsistent w.r.t. scale. While disambiguating scale during training is not possible without some kind of ground truth supervision, having scale-consistent depth predictions would make it possible to calculate the scale once during inference as a post-processing step and use it over time. With this goal, a set of temporal consistency losses that minimize pose inconsistencies over time is introduced. Evaluations show that introducing these constraints not only reduces depth inconsistencies but also improves the baseline performance of depth and ego-motion prediction.
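The abstract does not spell out the losses themselves, so the following is only an illustrative sketch of what a pose consistency penalty could look like, not the paper's formulation. It assumes poses are 4x4 rigid transforms where `t_ab` maps points from frame a to frame b; all function names (`pose_z`, `forward_backward_loss`, `cycle_loss`) are hypothetical. Two common checks are shown: a forward pose composed with its backward counterpart should give the identity, and the pose chained over two steps should agree with the direct pose.

```python
import math

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def pose_z(yaw, tx, ty, tz):
    """Hypothetical helper: rotation by `yaw` radians about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def invert_pose(t):
    """Invert a rigid transform: R becomes R^T, p becomes -R^T p."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def identity_residual(t):
    """Mean absolute deviation of a 4x4 matrix from the identity."""
    return sum(abs(t[i][j] - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4)) / 16.0

def forward_backward_loss(t_ab, t_ba):
    """Assumed penalty: forward pose composed with backward pose should be identity."""
    return identity_residual(matmul4(t_ab, t_ba))

def cycle_loss(t_ab, t_bc, t_ac):
    """Assumed penalty: chained pose a->b->c should match the direct pose a->c."""
    chained = matmul4(t_bc, t_ab)
    return identity_residual(matmul4(chained, invert_pose(t_ac)))

# Example: perfectly consistent poses give a residual near zero.
t_ab = pose_z(0.1, 1.0, 0.0, 0.0)
t_bc = pose_z(0.2, 0.5, 0.1, 0.0)
t_ac = matmul4(t_bc, t_ab)
consistent = cycle_loss(t_ab, t_bc, t_ac)
inconsistent = forward_backward_loss(t_ab, t_ab)
```

In a training setup such residuals would typically be computed on network-predicted poses with a differentiable tensor library; plain Python lists are used here only to keep the sketch self-contained.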


Similar resources

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints

We present a novel approach for unsupervised learning of depth and ego-motion from monocular video. Unsupervised learning removes the need for separate supervisory signals (depth or ego-motion ground truth, or multi-view video). Prior work in unsupervised depth learning uses pixel-wise or gradient-based losses, which only consider pixels in small local neighborhoods. Our main contribution is to...


Self-Supervised Monocular Image Depth Learning and Confidence Estimation

Convolutional Neural Networks (CNNs) need large amounts of data with ground truth annotation, which is a challenging problem that has limited the development and fast deployment of CNNs for many computer vision tasks. We propose a novel framework for depth estimation from monocular images with corresponding confidence in a self-supervised manner. A fully differential patch-based cost function is...


DeMoN: Depth and Motion Network for Learning Monocular Stereo

Our network is a chain of encoder-decoder networks. Figures 12 and 13 explain the details of the two encoder-decoders used in the bootstrap and iterative net part. Fig. 14 gives implementation details for the refinement net. The encoder-decoders for the bootstrap and iterative net use additional inputs which come from previous predictions. Some of these inputs, like warped images or depth from o...


DeMoN: Depth and Motion Network for Learning Monocular Stereo

Our network is a chain of encoder-decoder networks. Figures 15 and 16 explain the details of the two encoder-decoders used in the bootstrap and iterative net part. Fig. 17 gives implementation details for the refinement net. The encoder-decoders for the bootstrap and iterative net use additional inputs which come from previous predictions. Some of these inputs, like warped images or depth from o...




Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-31438-4_23